While public pressure mounts for teacher accountability, current methods
of evaluating teachers are widely regarded as inadequate. Teachers often
feel that evaluation is hasty, arbitrary, and threatening; more
important, it gives them little practical help in improving their
performance. This paper describes a pilot test of a new collegial
evaluation program that emphasizes the improvement of classroom
teaching. Working in pairs, teachers select their own criteria, observe
each other in the classroom, give each other feedback, and develop plans
for improvement. The program also provides for self-assessment and
assessment by students to be incorporated into the overall evaluation.
Thirty teachers and teacher trainees drawn from a variety of teaching
situations participated in the pilot test. On the whole, the results
were promising: teachers reacted favorably to collegial evaluation; they
were able to adapt the program to their own needs when necessary; and
they gained new ideas for improvement from it. (Author)

Introductory Statement



The mission of the Stanford Center for Research and Development
in Teaching is to improve teaching in American schools. Current major
operations include three research and development programs (Teaching
Effectiveness, The Environment for Teaching, and Teaching and Linguistic
Pluralism) and two programs combining research and technical assistance:
the Stanford Urban/Rural Leadership Training Institute and the
Hoover/Stanford Teacher Corps Project. The ERIC Clearinghouse on
Information Resources is also a part of the Center. A program of
exploratory and related studies provides for smaller studies not part of
the major programs.

This report describes the experiences of teachers and interns who
participated in a program to improve teaching through collegial
evaluation. Both the evaluation program and pilot test reported here were
designed and carried out by the Environment for Teaching Program. A
manual that provides step-by-step directions for implementing the
collegial evaluation program is in preparation.

Abstract



While public pressure mounts for teacher accountability, current
methods of evaluating teachers are widely regarded as inadequate.
Teachers often feel that evaluation is hasty, arbitrary, and
threatening; more important, it gives them little practical help in
improving their performance.

This paper describes a pilot test of a new collegial evaluation
program that emphasizes the improvement of classroom teaching. Working
in pairs, teachers select their own criteria, observe each other
in the classroom, give each other feedback, and develop plans for
improvement. The program also provides for self-assessment and
assessment by students to be incorporated into the overall evaluation.

Thirty teachers and teacher trainees drawn from a variety of
teaching situations participated in the pilot test. On the whole, 
the results were promising: teachers reacted favorably to collegial 
evaluation; they were able to adapt the program to their own needs 
when necessary; and they gained new ideas for improvement from it. 

A PILOT TEST OF COLLEGIAL EVALUATION FOR TEACHERS



Susan Stavert Roper, Terrence E. Deal, and Sanford M. Dornbusch

Schoolteachers and administrators alike are experiencing pressures
to improve classroom teaching as the result of a general movement toward
greater "accountability" in our educational system. In formulating
policies to institute accountability, state legislators have assumed that
required evaluation procedures will automatically result in better
teaching. Unfortunately, nearly everyone in the educational field
agrees that the evaluation of teachers is poorly done and gives teachers
little practical help in improving their performance. But here the
agreement ends. Some feel that improving principals' skills in evaluat- 
ing teaching performances is the answer. Others, feeling that the 
principal is overworked, would bring in outside evaluators to inspect 
classroom teaching. Still others would shift the focus from teaching 
performances to educational outcomes measured through achievement tests 
or behavioral objectives. 

In the last decade we have studied evaluation processes in many 
different organizations, including not only schools but an assembly 
line, a physics research team, hospitals, a Roman Catholic archdiocese, 
university faculties, a student newspaper, and even a football team. 
From these studies we have developed a general model of evaluation that
consists of six steps: (a) assigning goals, (b) setting criteria or
standards, (c) making observations (sampling performance), (d) appraising
performance, (e) communicating appraisals (providing feedback), and (f)
planning a program for improvement. These steps are interdependent; a
weakness in any one lessens the contribution that evaluation can make to
improving job performance (Dornbusch and Scott, 1975).

In our educational research we have gathered information on evalua- 
tion processes from 600 teachers and 33 administrators. This informa- 
tion was sufficient to convince us that weaknesses in one, two, or even 
all six evaluation steps were common in schools. For example, in one 

A shorter version of this paper will appear as "Collegial Evaluation:
Does It Work?" in Educational Research Quarterly, Spring 1976.
study that was part of this research, about half of the teachers
reported that they did not know what criteria were used to evaluate them
or that the evaluation criteria were too vague to be meaningful (Thompson,
Dornbusch, and Scott, 1975). They complained that observations of their
classroom teaching often amounted to no more than infrequent quick peeks
into the classroom by the principal. It is not surprising, then, that
teachers are very anxious about being evaluated and do not believe that
evaluation helps them improve their teaching.

But principals are not necessarily to blame for these shortcomings.
Often they are too busy with administrative duties to spend adequate
time observing teachers or providing them with useful feedback. Also,
because of their formal supervisory position, their evaluations often
seem threatening to teachers. And certainly the sense of threat has
been amplified by the punitive implications of accountability
legislation. As a consequence, many administrators feel more comfortable
using students' test scores as an indirect means of assessing teacher
performance.

Some recent evaluation programs stress the importance of student 
learning as an indicator of successful classroom teaching. But the 
emphasis on student outcomes creates problems of its own. For one thing, 
student variability is often so great that student outcomes may tell us 
less about teachers than about students, or even evaluators. More impor- 
tant, teachers have justifiably asked: How will these results help us 
improve our teaching? As the sports cliche goes, "Knowing the score 
doesn't help the team improve for the next game." In sum, evaluation 
programs that rely on the principal as the sole evaluator or on student 
outcomes as a means of assessing teacher quality will do little to improve 
teaching. 

Our investigation has convinced us that at a minimum, an evaluation
program aimed at improving classroom teaching must have three
characteristics: (1) it must not have punitive implications; (2) it must
designate evaluators to supplement the principal's evaluation; and (3) it
must focus on teaching performance. In this paper we propose collegial
evaluation as a strategy for satisfying these three criteria.

Collegial evaluation is advocated in many professional organizations
as a means of both maintaining standards and improving performance. In
fact, professionals derive much of their status from the fact that they
have formally assumed responsibility for evaluating one another through
their occupational associations. Clearly, a program in which teachers
evaluate one another poses some difficulties. Teachers have had almost
no experience in formally evaluating teaching, and often they consider
their classroom a private domain, out of bounds to others except the
principal. Despite these difficulties, our studies of open-space
classrooms and team teaching revealed that reducing the isolation of
classroom teachers had unexpected positive results. Since teaching
performance was more visible under these conditions than in the
traditional classroom, teachers viewed evaluation of their teaching by
colleagues as more legitimate. As these evaluations were exchanged,
teachers developed more respect for the ability of their colleagues to
make sound judgments. As a result, they were more willing to have
colleagues evaluate their teaching (Marram, Dornbusch, and Scott, 1972).
From these findings, we reasoned that if teachers were given the
opportunity to observe one another, they could give each other useful
feedback. They would thus become more willing to evaluate each other in
the future and would be able to use these regular evaluations to improve
their classroom teaching. Of course, not all teachers will be happy with
collegial evaluation. But since teachers are so disenchanted with the
present hierarchical structure of evaluation, they may be receptive to a
new approach in which they conduct their own evaluations.

We have developed a program of collegial evaluation for teachers that
emphasizes evaluation as a means to improve classroom teaching. Since 
the spring of 1974 over 150 teachers have been introduced to the program 
through workshops. Thirty teachers participated in the pilot test. The 
teachers worked in pairs, selecting criteria, observing each other, pro- 
viding feedback based on their observations, and helping one another 
develop specific plans to improve their teaching. In addition, teachers 
administered a student questionnaire and completed two types of self- 
assessment to gain more information to integrate into their improvement plans. 
The results of this test suggest that our collegial evaluation program
can help both experienced teachers and teacher trainees improve their
teaching. This memorandum describes the experiences teachers had in each
step of the collegial evaluation program in the hope that other teachers
and administrators will be encouraged to give collegial evaluation a try.

The Collegial Evaluation Program 

As we have mentioned, in our collegial evaluation program teachers work
in evaluation partnerships to improve the quality of their teaching. The
program also provides for self-assessment and student assessment. The
entire sequence of self-assessment, student assessment, observations, and
conferences requires ten to twelve hours spread over a month or two. The
program is flexible and can be implemented by an entire faculty, a
department, a teaching team, or any two interested teachers. We are
presently preparing a manual for teachers containing all the directions
and forms needed to implement the program. In addition, the manual will
explain the rationale for each step, incorporate examples of successful
practices from other teachers, and offer suggestions to help teachers get
the greatest benefit from this experience. The collegial evaluation
program consists of seven interrelated steps:

1. Choosing a partner. This partnership between two teaching colleagues
   is the heart of the program of professional development.

2. Selecting evaluation criteria. In some schools teaching standards
   have been defined specifically enough to serve as a guide for
   evaluation. We help by providing examples of criteria used by
   other teachers.

3. Self-assessment. Each teacher completes a self-evaluation form,
   which is based partially on the criteria selected as well as on a
   questionnaire given to students.

4. Student assessment. A questionnaire is provided to get important
   feedback from students.

5. Observations. Observing a colleague's teaching and being observed
   in turn is the crucial step. We have developed forms for making
   observations based on the selected criteria and tips to improve
   observational skills.

6. Conference on observations. After each set of observations, the
   collegial pair holds a conference. The purpose of the conference
   is to report observations and develop plans for improvement in
   appropriate areas. The program specifies a structure for the
   conference so that each teacher knows how to proceed.

7. The improvement plan. A final conference is held to pull together
   observations by colleagues, self-assessments, and student
   assessments. Once again, the structure of the conference is specified
   and some suggestions are provided for developing a long-term
   program for improvement.

The Pilot Test 


Our pilot test was designed to serve several purposes: (1) finding
out how teachers would react to collegial evaluation; (2) helping to
identify the unforeseen problems that inevitably arise in any new venture;
(3) discovering whether or not teachers would adapt the program to fit
their own needs, and if so, in what ways; and (4) helping us improve the
program. Of course, the critical question was: Will the collegial
evaluation program work?

Although the pilot test sample included only 30 teachers, they were 
deliberately drawn from a variety of teaching situations. The situations 
included (1) different subject areas, (2) different grade levels (K-12), 
(3) suburban and inner-city schools, (4) open-space and self-contained 
classrooms, and (5) teachers with varying levels of experience, including 
both teacher interns and credentialed teachers.

We worked with two groups of teachers: teachers in an elementary school 
serving a California suburban community, and teacher trainees in the 
Stanford University teacher intern program. The interns were assigned to
junior high schools and high schools on the San Francisco Peninsula and in 
San Jose, California. They were teaching numerous subjects including 
natural sciences, social studies, English, art, music, physical education,
and languages. Some were working in upper-middle-class schools, and others
in predominately Black or Chicano inner-city schools. The elementary
teachers worked in a school that has been architecturally designed to permit
alternative instructional approaches in its various open-space "pods." The
evaluation partners in this school were members of the same teaching team.
The secondary teachers, all interns, had only a few months of teaching
experience, while the elementary teachers varied in experience from two
years to over fifteen. In sum, the diversity of teachers, students, and
settings in the pilot test allowed us to determine whether the collegial
evaluation program would work across a variety of different situations.

These teachers participated in all stages of the collegial evaluation
program. Their experiences with various aspects of the program are
summarized below, along with their comments and suggestions.

Choosing a Partner 

As it happened, partnerships were formed quite differently among 
the elementary school teachers and the teacher trainees. The teacher 
trainees were free to choose their own partners, and they usually did so
by common agreement on the basis of friendship or proximity. In the 
elementary school the principal assigned partners, and all of the partners 
had worked together previously in open-space classroom teams. The teachers 
were generally satisfied with the principal's assignment, but most teachers 
as well as interns felt that collegial evaluation participants should be 
able to select their own partners. All were skeptical of random selection 
as well. 

There was some disagreement, however, about the criteria that should 
be used in selecting partners. Some stressed previous friendship. As 
one teacher said, "Working with a fellow teacher on this program required 
a lot of respect and trust. You've got to really like one another. It's 
almost like a marriage — only if you like someone can you be honest." 
Others emphasized the importance of choosing a partner from the same 
subject area and/or grade level to maximize the relevance of feedback. 
One intern, however, argued that she learned a great deal from observing 
a colleague in another subject area. Her field was English; her colleague 
was in biology. She said that she was immediately impressed by the number
and variety of materials available to students in the biology lab and
realized for the first time how meager her English classroom materials were.

Some interns felt that teaching experience should be a critical factor
in the selection of partners. They felt that a more experienced teacher
would not take their criticisms or suggestions seriously, and that they
would be hesitant to express their own fears of inadequacy to an "old pro."
Other interns reported, to the contrary, that their partnership was limited 
by an insufficient experience base from which to generate "well-seasoned" 
suggestions for improving teaching. By contrast, the elementary teachers 
did not even mention teaching experience as a factor in the selection of 
partners. Levels of experience did not affect the quality of feedback or 
mutual respect within the elementary school group. 

Both the interns and the elementary teachers agreed that partners 
should share a similar educational philosophy. The interns were
particularly adamant on this point, maintaining that they tended to
ignore criticism from someone whose views were radically different from
their own.
An illustration of the importance of educational philosophy came from a 
pair of interns who realized after observing each other that they both 
needed to become more directive and firm with their students. Although
their supervisors had previously mentioned that they were losing control 
of the class, the interns had attributed this criticism to a philosophical 
conflict between them and their supervisors. They therefore made no attempt 
to change their teaching behavior. However, when they received similar
feedback from someone they regarded as more sympathetic to their views, 
they were willing to take the criticism more seriously. Both wanted an 
"open," "trusting" classroom environment, but what they saw in their 
respective observations was "chaos." As one of the partners said: 

We have a philosophical stance that makes each 
of us uncomfortable with the role of "authority"
figure in our classrooms. We are both searching for
ways to make learning happen without crushing spirits, 
damaging self-concepts, or belittling individuals. 
It is our shared ideals that make it easy for me to 
accept Bill's criticism — I cannot reject it as being 
in disagreement with my fundamental beliefs. 

Above all, the critical element in successful collegial evaluation
is mutual respect. All participants agreed that respect for their
partner's ability was more important than friendship, teaching experience,
philosophy, or subject area as a basis for a successful collegial
relationship.

Selecting Evaluation Criteria 

The process of selecting evaluation criteria consists of five steps:
(1) the two teachers identify the pool of possible criteria using such 
sources as school goals, accountability guidelines, recent research, and 
their own philosophy; (2) each teacher makes a list of four or five
criteria and exchanges lists with his or her partner; (3) the two teachers 
agree on a list of four or five criteria; (4) the two teachers review 
the list to make sure each criterion is specific and observable; and (5) 
the criteria are listed on the observation form. 

According to both the interns and the elementary teachers, selecting
criteria was clearly the most difficult step in the collegial evaluation
process. The main problem was developing criteria that were specific
enough to be observable but still significant enough to reflect important
aspects of teaching performance. For example, the criterion "ability to
write clearly on the blackboard" would have been specific, but teachers
were more interested in focusing on broader areas such as rapport with
students and ability to motivate students. Criteria for observing
rapport and ability to motivate were much more difficult for them to
develop.

Teachers also reported difficulty in selecting criteria that could
be applied to the actual situations in which observation took place. 
Some teachers had difficulty because their criteria were appropriate 
for a different instructional activity than the one they observed. 
Others selected criteria that focused on too many activities simulta- 
neously. 

Partners usually were able to agree on a list of criteria. Although
the program provides an option whereby the two individuals may use
completely different criteria, no one took this option in the pilot test.
As the partners developed their list, the major source of disagreement
was the effort to define the criteria in specific, observable terms. As
noted, part of the problem was caused by vague and ambiguous criteria
that did not lend themselves to clear definition or observation. For
example, teachers who selected "uses communication skills effectively,"
"degree of engagement by students," or "response of class to lesson" spent
much of their meeting trying to decide what they meant by "communication
skills," "engagement," or "response."

Although trying to select important yet observable criteria was
difficult, teachers did not find the task boring or unproductive. As one
said, "Selecting useful criteria forced me to clarify my own educational
philosophy. I had to decide what was really important to me and then try
to operationalize my goals so they could be observed."

Some of the better criteria were specific to certain subject matter.
For example, two physical education teachers agreed that ensuring the
physical safety of students was an important criterion for successfully
teaching a tumbling lesson. During the lesson the observing teacher
noted that although the tumbling mats had been carefully arranged, several
students had not tied back their hair and were chewing gum during the
practice session, both violations of safety rules. Similarly, two music
teachers were able to give each other excellent feedback using criteria
such as "time limit per piece," "explanation of the warm-up period,"
"presentation of rehearsal objectives," and "discussion of stops made
during rehearsal."

Also useful were criteria that focused attention on specific mannerisms 
or behavior patterns of teachers. Watching for "any distracting speech 
mannerisms or gestures," a teacher discovered that her partner ended 
almost every sentence with a tentative "OK?" Until the first conference 
the teacher was totally unaware that she had this disturbing habit. One
pair who taught in an open-area classroom listed "teacher mobility around
the pod" as a criterion. Their observations revealed that one teacher
was constantly moving around the classroom while the other was not moving
enough to supervise students adequately.

Many teachers had difficulty making the leap from identifying a
general area for observation and potential improvement to developing criteria
that could be used to assess teaching performance in that area. But others
were quite successful in developing appropriate criteria. For example, some
teachers selected "motivates students to participate in discussion" as a
general area and came up with "number of times teacher responded with a
positive statement," "number of negative comments by teacher," "average
length of time teacher waited for an answer," "number of students who were
called on to answer a question," and "terms teacher used to praise or sanction
students" for specific criteria. These teachers were able to learn about
some of their specific behaviors that enhanced or hampered student
motivation. For example, one teacher learned that the words she used to praise
students were too dramatic when her partner observed that students dismissed
her praise as unrealistic. She vowed to delete "fantastic" and "terrific"
from her vocabulary except for "truly fantastic" responses. Another teacher
learned that she habitually called only on students seated in the front rows.
She decided to rotate seating positions weekly to enhance the opportunity of
all students to participate. Overall, teachers recommended that in
developing criteria the two partners should discuss how they would measure and
observe a criterion before agreeing to use it.
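
As an illustration only, the countable criteria named above could be
tallied from an observer's notes roughly as follows. This is a minimal
sketch, not part of the collegial evaluation program or its forms: the
log format, the event names, the student names, and the summarize
function are all hypothetical, and the numbers are invented rather than
taken from the pilot test.

```python
from collections import Counter
from statistics import mean

# Hypothetical observation log: one (event, value) pair per note the
# observer jots down. All names and numbers here are illustrative.
log = [
    ("positive_statement", None),
    ("wait_seconds", 2.0),          # time teacher waited for an answer
    ("student_called", "Ana"),
    ("negative_comment", None),
    ("wait_seconds", 4.0),
    ("student_called", "Ben"),
    ("positive_statement", None),
    ("student_called", "Ana"),
]

def summarize(log):
    """Turn raw observation notes into the countable criteria."""
    counts = Counter(event for event, _ in log)
    waits = [value for event, value in log if event == "wait_seconds"]
    students = {value for event, value in log if event == "student_called"}
    return {
        "positive_statements": counts["positive_statement"],
        "negative_comments": counts["negative_comment"],
        "average_wait_seconds": mean(waits) if waits else None,
        "distinct_students_called": len(students),
    }

summary = summarize(log)
```

Agreeing in advance on what counts as each event is the point of the
exercise; a tally like this is only as useful as the partners' shared
definition of, say, a "positive statement."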

There was some debate among teachers about the extent to which the
criteria should reflect areas of potential weakness rather than areas of
strength. Most agreed that potential weaknesses should be the basis for
at least some of the criteria, since the purpose of collegial evaluation
is improvement, not just reinforcement. The process of identifying areas
of teaching weakness was more difficult for the elementary school teachers
than for the interns. One teacher suggested that early observation of a
teacher reputed to be exemplary might help in identifying one's own
problem areas. It would be helpful, this teacher said, to have an
opportunity to compare your own teaching performance with that of another
teacher prior to deciding on criteria.

An extremely productive technique for generating criteria was to 
distribute the student questionnaire before selecting criteria. Our col- 
legial evaluation manual will be revised to incorporate this finding. Some 
teachers distributed the student questionnaire before meeting with their
partner to select criteria. When one pair found that many students did
not understand a teacher's directions, "clarity of directions" became
the first criterion on their list.

In summary, teachers in the pilot test suggested that the evaluation
partners decide together how they would observe a criterion before
including it on their list. In selecting criteria they learned that vague,
ambiguous, and global terms were not useful guidelines for observation.
Criteria related to a particular subject or grade level were often
helpful. Teachers found that the responses from the student questionnaire
helped them generate criteria. Although they agreed that selecting
criteria was the most difficult step in the collegial evaluation program,
they thought it was worth the effort. Selecting criteria helped them
decide not only where they needed to improve but what was most important
to them as educators.

Observations 

All teachers in the pilot test observed one another for two classroom
periods, the observation time the program requires. Some participants
felt that two periods did not provide enough time for observation. However,
often these teachers had selected too many criteria or had selected cri- 
teria that were not applicable to the classroom situation they observed. 
More observations are certainly desirable, but since the program is de- 
signed to minimize inconvenience to teachers and administrators, it is 
preferable to take steps to make two observation periods sufficient. 
Selecting criteria appropriate for the classroom session is one example. 

A number of interns selected different classes and/or subject areas 
for their second observation in order to learn whether their strengths 
and weaknesses were the same across different subjects and classrooms.

Teachers in the pilot test reported that they learned as much from
observing as from being observed. Many related a host of new teaching
techniques acquired in their role as observer. In one open-space
classroom the teachers switched students for observation. One teacher
observed the other teaching a lesson to her students and vice versa.
Both teachers reportedly learned from this trade-off. One remarked that
she had not realized a certain group of students never participated in
her class until she sat at the back and watched them being taught by
someone else.

The interns also reported benefits from observing. Under normal
circumstances, interns rarely have the opportunity to see anyone teach a
class other than their master teacher. As one intern put it, "By seeing
other interns you get to see yourself with regard to your peer group; it
is reassuring to know that you are not the only one making mistakes."

All participants liked the exposure to other methods of instruction
and teaching styles. Teachers rarely have a chance to observe one another
teaching, particularly if they are in self-contained classrooms. But even
the teachers in the open-space school said that under usual conditions,
they were too busy to observe their teammate adequately. Collegial
evaluation gave them the chance not only to observe but to focus their
observation using specific criteria.

The quality of feedback exchanged in the conferences was largely
dependent on the quality of observations. The best observers were those
guided by a few specific criteria that were appropriate to the particular
activity observed. They learned more from their observations and
were better able to offer their partner concrete and useful information.

Conferences 

Conferences require the ability to give constructive criticism with- 
out damaging egos or destroying long-term relationships. As our collegial 
evaluation program specifies, teachers in the pilot test exchanged feed- 
back on three occasions: after each of the observation periods and at
the wrap-up conference. In addition, they rated their strengths or 
weaknesses for each of the shared criteria on the self-evaluation form, 
which is similar to the observation form, making it easy to compare the 
two evaluations. In every case, participants were harder on themselves 
than their colleagues were. 

The interns were much more willing than the elementary teachers to
give low ratings to their colleagues and to give critical feedback on the
observation form. Interns, by definition, are "people learning the skills
of teaching," while certificated teachers (theoretically at least) already
possess these skills. From this perspective, it is not surprising that
interns were more comfortable offering written criticism than the
elementary school teachers. During the conferences, however, teachers
exchanged criticism and did more than pat one another on the back. Although
they were reluctant to write down their negative comments, they were
usually quite candid in their conferences.

An important purpose of the conferences is to develop specific strategies 
for improvement- Since the eleirentary school teachers worked together in 
the same classroom area, many of them identified problems that could be 
worked on cooperatively. For example, one pair agreed that the noise level 
in their area was occasionally too high and they discussed how, as members 
of a team, they could create a quieter learning atmosphere. Because these
teachers worked together, they were motivated to help each other — to give 
feedback that would improve not only their individual teaching performance 
but the overall atmosphere of their classroom.

One teacher pointed out that a major difference between criticism during
collegial evaluation and evaluations by an administrator was "the way criticism
was phrased." We were continually impressed by the tact and diplomacy
exhibited in the conferences. Criticisms were frequently presented as
suggestions for alternative techniques. In one teacher's words, "Instead
of having someone say, 'you should do this', a colleague was more likely
to say, 'something that worked well for me was this technique.'" This
approach not only was less threatening but was perceived as more legitimate.
If the technique worked for a colleague, it was worth a try.

The interns' conferences emphasized diagnosis rather than specific
recommendations. They spent more time and effort analyzing teaching strengths
and weaknesses than the elementary school teachers did. Perhaps because
of their relative inexperience, they did not have as many concrete suggestions
to offer one another and instead devoted some time at each conference
to brainstorming alternative teaching strategies.

Collegial evaluation provided positive reinforcement as well as constructive
criticism. Suggestions for improvement were balanced with praise
for effective teaching. Praise seemed to fill a very great need. As one
teacher said, "When your colleague praises you, it means so much." Praise
improves teaching by reinforcing successful practices, thus encouraging
their frequent use. In school, teachers rarely receive praise from their
colleagues because they are not observed or evaluated by them. Though the
value of positive reinforcement in motivating pupils is universally recognized,
this practice has seldom been extended to teachers, in spite of
the fact that the importance of teachers' job satisfaction and faculty
morale has long been recognized by teachers and administrators alike.

The feedback given in the conferences encompassed virtually every aspect 
of classroom activity. Teachers learned not only about their own performance
but about the overall climate of their classroom. For example, one
intern noted, "There was a warm, cooperative atmosphere in this classroom. 
It was created by allowing student work groups to sit together on pillows 
on the floor and emphasizing the importance of group evaluation for the 
task." Another intern summarized his feeling for a class by telling his 
partner, "People are noisy; that doesn't bother me. They are talking,
getting excited, and having fun." On a more critical note, an art intern 
told his partner that clean-up period was "utter chaos" and suggested that 
students be assigned responsibilities for cleaning up after themselves. 

Teachers also reported learning more about the behavior of particular 
students. One observer said of a self-directed project, "The autonomous 
kids go directly to work, but those who need a lot of teacher direction 
and support are left out." During a classroom discussion session, another 
observer noted, "While most students seem to be involved, a few appear to 
be untouched by the discussion." And during a lecture presentation another 
observer said, "A couple of students did not understand; they needed ex- 
tensive clarification." These comments became catalysts for discussion in 
the conference. The observed teacher wanted to know which students were 
not autonomous, which were untouched by the discussion, and which needed 
further clarification. The partners then discussed ways to overcome these 
problems. 

Some of the observations focused on problems of classroom discipline.
Classroom control was more frequently discussed in conferences by interns
than by teachers. Throughout the evaluation process, interns helped one
another identify which students were creating problems and what might be
done to improve classroom order. For example, one intern learned that
"a small group of boys in the back are goofing off." Following the conference
this small group was broken up and dispersed throughout the classroom.

After specific discipline problems had been openly discussed in the 
conferences, both interns and teachers often took steps to solve them. 
Overlooking a particularly noisy student is difficult when a colleague has 
identified the problem through systematic evaluation and provided a just- 
ification for action. For example, many interns reported a reluctance to 
openly chastise their students. They feared that any display of authority 
would squash independence or creativity or, perhaps more important, that it
would jeopardize their students' affection for them. But when a colleague 
says that a certain student is testing the limits of tolerance (and what's 
more, that the same student creates a similar problem in his or her own 
classroom), a teacher feels more justified in trying to find sound teaching 
techniques to bring that student into line. 

Understandably, much of the feedback exchanged during conferences 
focused on the teacher's behavior in the classroom. Some discussions were 
directed at subject-matter presentation. Teachers gave each other useful 
information about the quality of materials used in lessons, the appropriate- 
ness of the language used in classroom presentations, the clarity of object- 
ives and direction, and specific techniques for making their lessons more 
interesting. These comments ranged from general observations, such as 
"The material is going over the kids' heads," to more specific ones, such
as "Your explanation of chromatic half steps was a little complicated."
Similarly, the suggestions for improvement ranged from general ones
concerning the teacher's overall performance, such as "You should take at 
least a half hour to present material you are now covering in ten minutes," 
to very specific ones, such as "Why not give each student a copy of the key- 
board to follow along during your explanation of chromatic half steps?" 

The conferences also provided a forum for discussing teacher-student
interaction, which was a matter of great concern to the participants,
judging by both the criteria they chose for observing and the feedback they
gave during conferences. A common observation was that a certain student
or group of students was ignored. Many teachers wanted feedback concerning
whether they used eye contact with everyone in their room, whether they
called on different pupils rather than continually selecting the same ones,
and whether they gave equal attention to students. One teacher learned
that though she was successful in finding occasions to talk with all of
her students individually about their art projects, most of her remarks
were negative. In the conference her partner suggested that "students should
get more reinforcement on the positive aspects of their work." Teachers
continually praised one another for using positive reinforcement.* As one
said, "You gave lots of 'warm fuzzies' this morning and it meant a lot to
the kids."

On a more procedural note, participants found that holding conferences 
no more than two or three days after observations improved the quality of 
feedback. Similarly, the observation form (where ratings and comments on 
the colleague's performance are written) was more useful if it was completed 
immediately after observing. But most important, teachers reported that 
the quality of their conferences ultimately depended on the willingness of 
the partners to be reasonably honest with one another. 



* Teachers rarely told one another to be more critical of their students'
work or to develop higher expectations for their students, either individually
or as a class. They seemed to believe that each student should
receive a lot of teacher warmth and approval regardless of his academic
performance. We believe that this approach has serious flaws. Other
research shows that students develop greatly inflated opinions of their
academic skills in classrooms characterized by strong and uncritical
teacher approval. Overstressing warmth and praise may have negative consequences,
since it can lead students to have totally unwarranted beliefs
about their academic skills. See G. C. Massey, M. V. Scott, and S. M. Dornbusch,
Racism without Racists: Institutional Racism in Urban Schools, Occasional
Paper No. 8 (Stanford, Calif.: Stanford Center for Research and Development
in Teaching, 1975), pp. 7-10. Reprinted from The Black Scholar, 7, No. 3
(November 1975), pp. 10-19.

Self-Assessment and Student Questionnaire

Following the structure of our collegial evaluation program, several 
of those who participated in the pilot test distributed the student 
questionnaire to their classes and completed the self-assessment form as
part of the evaluation process. The teacher questionnaire contains items
parallel to the student questionnaire. These allow teachers to identify
similarities and differences in their perceptions of themselves and their
students' perceptions. For example, the teacher responds to the question,
"How often do you encourage students to ask questions when they don't
understand what's going on?" Students answer the similar question,
"When you don't understand what's going on in this class, how often are
you encouraged to ask questions?" Like the teacher, students use a five-point
scale which ranges (for this question) from "always" to "never."
After combining the student responses and computing a classroom average,
the teacher can discover the level of agreement between his self-assessment
and his students' assessment. Moreover, by looking at the distribution
of responses, a teacher might find that some students "never"
feel encouraged to ask questions, even though most students "usually" do.
Both the classroom average and the distribution thus provide interesting 
and useful kinds of information. 
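
The tabulation described above can be sketched in a few lines of Python. This is only an illustration, not the procedure used in the pilot test: the report names the scale points "always," "usually," "seldom," and "never," so the middle label "sometimes," the numeric coding from 5 to 1, the function name summarize_item, and the sample responses are all assumptions made for the example.

```python
from collections import Counter

# Assumed five-point scale. "sometimes" and the 5..1 coding are
# illustrative guesses, not taken from the questionnaire itself.
SCALE = {"always": 5, "usually": 4, "sometimes": 3, "seldom": 2, "never": 1}


def summarize_item(student_responses, teacher_response):
    """Return the class average, the response distribution, and the gap
    between the teacher's self-rating and the class average for one item."""
    scores = [SCALE[r] for r in student_responses]
    class_average = sum(scores) / len(scores)
    counts = Counter(student_responses)
    return {
        "class_average": class_average,
        # List every scale point, including those nobody chose.
        "distribution": {label: counts.get(label, 0) for label in SCALE},
        "teacher_minus_class": SCALE[teacher_response] - class_average,
    }


# Hypothetical class of 25 students answering the "encouraged to ask
# questions" item; the teacher rated herself "always".
responses = ["usually"] * 18 + ["always"] * 4 + ["never"] * 3
summary = summarize_item(responses, "always")
print(summary["class_average"])
print(summary["distribution"])
```

In this made-up class the average sits close to "usually," yet the distribution still shows three students answering "never," which is the kind of detail an average alone would hide.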

The contribution of these questionnaires to the evaluation process 
was summarized by one teacher:

I believe that the student questionnaire was extremely
valuable in providing information that I myself or a third person
could not possibly provide adequately or accurately. The specific 
kinds of questions deal with those problems that cannot be readily 
observed. They focus on those students' personal and academic needs 
that are basic to learning. 

One of the most striking results of the pilot test was the high level 
of agreement between teachers and students as shown by responses on their 
questionnaires. This similarity was not anticipated by the teachers. 
One teacher remarked, "I was very surprised to find that my own perceptions
agreed fourteen out of twenty-one times (over 66%) with the average
of the students. I think this proved that even though my class may not
be the greatest one in the world, my students and I certainly agree on
what it is." Another teacher said, "The questionnaires indicate that I
have a realistic understanding of students' feelings toward the class
and myself as a teacher."
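
The report does not say how this teacher decided that a response "agreed" with the student average. One plausible reading, shown here purely as an assumption, is that the teacher's rating matched the class average rounded to the nearest scale point. The function name count_agreements and the sample ratings are hypothetical.

```python
def count_agreements(teacher_ratings, class_averages):
    """Count items where the teacher's rating equals the class average
    rounded to the nearest scale point (halves round up).

    This definition of agreement is an assumption for illustration.
    """
    return sum(
        1
        for rating, average in zip(teacher_ratings, class_averages)
        if rating == int(average + 0.5)
    )


# Hypothetical ratings on four items (5 = "always" ... 1 = "never").
teacher = [5, 4, 2, 3]
averages = [4.6, 3.2, 2.4, 3.9]
agreed = count_agreements(teacher, averages)
print(agreed, "of", len(teacher), "items agree")

# The teacher quoted above reported 14 agreements out of 21 items.
reported_rate = round(100 * 14 / 21, 1)
print(reported_rate)
```

Under any such definition, fourteen agreements out of twenty-one items is about 66.7 percent, consistent with the teacher's "over 66%."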

Despite the general agreement, there were several items on the 
questionnaire that produced substantial disagreement between teachers 
and their students. These findings raised new questions and prompted 
teachers to investigate the underlying reasons for the discrepancy. For 
example, one teacher was surprised to find that on the average her students 
felt classwork was "usually" too fast and difficult. Her first inter- 
pretation was that she had overestimated her students' abilities. After 
looking more closely at the distribution of responses, she saw that almost 
as many students felt the work was "just right" as felt the work was 
"much too difficult." The second interpretation focused on the diversity 
of student ability in the classroom. To improve her teaching, she began
to individualize instruction so that all of her students would be able to 
do some things well. 

General disagreement was produced between the intern teachers and 
their high school students by another interesting question: "How impor- 
tant to you is having the teacher like you?" Secondary students rarely 
reported that this was either "extremely" or "very" important. The 
secondary interns seemed a little hurt and surprised by their students' 
indifference. This finding generated a very fruitful discussion among 
interns. It led to admissions that they were probably upset by this student
report because they wanted so much to be liked by their own students.
They had just assumed that liking was reciprocal. They confided to one 
another that wanting to be liked sometimes interfered with their better 
judgment as teachers. This conclusion was incorporated into their over- 
all plans for improvement. 

By comparison, elementary teachers were a little overwhelmed at their
students' rating of their teacher's importance in their lives. Almost all
elementary students said it was "extremely important" to be liked by their 
teacher. Of course, these veteran teachers had suspected that their stu- 
dents wanted their affection, but they had not known how strong or how 
widespread this feeling was. Such unanimity in their students' responses 
made them sensitive to a number of related behaviors in the classroom. 
For example, after reviewing the questionnaire but prior to observation,
one teacher noted about another, "Those kids are always touching you, and 
you never fail to respond." 

In addition to insights gained from students' responses on each item 
of the questionnaire, teachers discovered that examining the responses 
on several items at once sometimes revealed interesting patterns. For 
example, one teacher discovered that her students reported being more
confused than she had suspected. They agreed that the teacher's directions 
were unclear and that they were seldom encouraged to ask questions. She 
felt that their confusion might be alleviated if she took measures to 
clarify her directions and encouraged them to ask questions whenever they 
were confused. 

Although anonymity was ensured on the student questionnaire, teachers 
and interns spent a lot of time guessing which students had given certain 
responses. The elementary teachers, who knew their students much better 
than the interns, seemed confident of their ability to make these guesses. 
When one student responded that he "never received good grades" even when 
he did "good work," the teacher said, "I know who that is, and he's right. 
We've got to start giving him some rewards for his efforts." The teacher 
was confident that this was the same student who responded that the teacher 
never let him know when he was doing "good work." 

The participants agreed that maintaining anonymity was important if
they wanted honest responses from students, but one lamented that "it 
would be valuable to know a particular student whose answers were radically 
different. It may be that this student is having difficult problems that 
I have overlooked or that are not obvious to me, and I would want to give 
him the special help that might be needed." 

In the pilot test, one of the interns did a fine job of developing 
his own student questionnaire. He wanted to obtain specific information 
about his skills as a choir director. He learned that his conducting was 
"fairly easy to follow," but almost half of his students felt that he 
"stayed on one piece of music too long." Most of the choir liked the 
music "O.K.," with just a few liking it "a lot" or "not much." Only two 
students thought he looked like a "madman" when conducting. These items 
provided an excellent supplement to the more general student questionnaire.

Student questionnaires provide teachers with information they cannot 
obtain elsewhere. Only students can tell a teacher whether or not they
are interested and comfortable in the classroom. The problems students
perceived were translated into specific criteria for the teacher's colleague
to observe and were discussed in the conferences. The student
assessment was a very valuable input that the teachers took into account
in assessing their strengths and weaknesses and making plans for improvement.

Self-Assessment on Selected Criteria 

In addition to the teacher questionnaire, participants completed a
self-assessment form based on the criteria they had selected jointly with
their partners. After their teaching was observed, this self-assessment
could be compared with the observation form to help focus the conference
on areas for improvement. Overall, participants were usually much more
critical of themselves, both in ratings and in negative comments, than
their colleagues were. They generally agreed with their partners' observations
on areas of weakness, and most spent their conference in swapping
ideas for improvement rather than in resolving disagreements.

A colleague's agreement was helpful in legitimatizing a teacher's
perception of her strengths and weaknesses. For example, one teacher
commented, "In discussion, I tend to rely on the same students who always
have the answers, and I do not phrase open-ended questions to include 
everyone." When her colleague noted that "two boys spoke often, a few 
girls spoke occasionally, but no one else entered the discussion," her 
self-assessment was confirmed. A good part of their first conference 
focused on how she might increase student participation. In the second 
observation her colleague noted that "the discussion included more students 
and some who had not previously participated. You praised the newcomers.
Good."

In her self-assessment another teacher noted a need for "some improve- 
ment" in lectures because she "relied too heavily on note cards." During 
the first observation her colleague identified the same area: "The organiza- 
tion and sequence of the lesson is good, but you occasionally stopped to 
refer to notes." At the second observation the problem was not as severe
and the colleague observed, "You relied on notes much less." 

Of course, not all of the problems were so easily remedied. In a
self-assessment one teacher reported the need "to project my voice." Her
colleague noted, "Teacher's quiet voice tends to trail off" on the first
observation form. In the second observation period the colleague reported,
"Teacher's voice does not carry above sound of the slide projector."
This is clearly a problem that needs to be addressed in that teacher's
improvement plan.

The Improvement Plan

Developing a plan for improvement is the most important step of the
collegial evaluation process. But the quality of each teacher's plan
depends on how well the other steps have been carried out. The plan for
improvement is formulated in a final "wrap-up" conference between the two
partners. Each teacher integrates all the information he or she has received
from self-assessment, student questionnaires, and peer evaluation,
and presents his partner with a composite list of strengths and weaknesses.
Together the teachers decide on the specific strategies each will use to 
improve their teaching performance in areas of weakness. In addition, 
they determine how they will evaluate the results of these strategies. 
Finally, they identify any resources they will need to carry out their 
improvement plan. 

In our pilot test of collegial evaluation, the improvement plans
spanned the whole range of teaching activities: presentation of subject
matter, classroom control, motivation, student interest and involvement,
positive reinforcement, and classroom organization and atmosphere. The
improvement plans were based on evaluations that showed a remarkable
amount of agreement between the teachers themselves, their colleagues, 
and their students. In most cases a teaching weakness identified by one of 
these sources was corroborated by the others. 

For example, one teacher listed as an evaluation criterion, "Do not 
ignore any segment of the class concerning questions or needs — give attention 
equally." On the student questionnaire several students reported that 
they were "seldom" or "never" encouraged to ask questions in class. On the
basis of classroom observation the teacher's partner noted: "The less
capable students are not involved, especially those at the back." As
part of the plan for improvement, the teacher specified, "With the help of
my peer, I will first identify those students whom I have ignored. I
will make a point of talking to each of them every day. I'll keep a check
list to make sure I spend some time with each of these children." Another
teacher developed a plan to deal with a similar interaction problem in a
different way. To encourage the nonparticipators at the back, she decided
to rearrange the class and move the pupils at the back into the first two
rows. She also said that she would "give those individuals who have not 
been participating responsibility for explaining things to the class and 
helping others with their work." 

An intern chose as an evaluation criterion, "I present subject matter
at a level appropriate to student ability." He was perplexed when most
of his students reported on the questionnaire that they were confused by
his explanations. Then his peer commented, "You use a lot of terms which
go way over some of these kids' heads." In his improvement plan this
intern listed a number of specific strategies to overcome the problem.
Among these were: "I will try to define clearly all new terms which I
use in class and be more careful to write these terms and their definitions
on the board. I'll use pretests to determine pupil knowledge in the subject
area. For those who do well on these tests I will design self-directed
projects. This will leave me free to spend more time with the
slow-achievers."

Some of the improvement plans called for relatively minor changes; 
others envisioned a major reorganization of the classroom and substantial 
changes in teacher behavior. Two of the elementary school teachers felt 
that they both needed to maintain a quieter learning environment. Such 
a concern is not atypical in open-space classrooms. After observing one 
another, they discovered that the noisiest time of the day came when they 
grouped their students by ability in math and language arts. The noise 
came from the "low ability" youngsters, and it prevented them and others 
from concentrating. As part of their improvement plan, the teachers
decided that the next year they would experiment with more heterogeneous
groups.

Many of the identified weaknesses were not so difficult to remedy.
For example, one art teacher, concerned about giving appropriate positive
reinforcement for good work, benefited from his colleague's observation that
he did not have any student work displayed in the classroom. He planned to
"reserve a large space in the art room, school library, and hall display
cases for the exhibition of student work." Another intern, whose problem
was that he never had time to finish his lesson, decided to save a few
minutes each period by letting students distribute and collect classroom
materials rather than doing it himself.

For each of the specific strategies, teachers were asked to determine
how they would assess their progress. Plans for assessment were as varied
as improvement strategies. Teachers planning to improve their presentation
of subject matter often relied on student cognitive outcomes as a measure
of their success. The teacher mentioned above, who planned to explain and
define new terms more carefully, listed as one indicator of progress the
number of times students used the new terms in their essays.

Several teachers decided to use the student questionnaire as a post-test
device to assess their improvement. Comparing the student response
before and after the improvement plan was put into effect would help them 
assess their progress in such areas as motivating students, evaluating them, 
presenting material clearly, individualizing subject matter, displaying 
interest in students, and developing material appropriate to the students' 
level. 
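
A before-and-after comparison of this kind could be tabulated as in the following Python sketch. It assumes that responses have already been coded on a 1-to-5 scale and that the same items appear in both rounds. The function name compare_rounds, the item labels, and the scores are hypothetical and are not drawn from the pilot test data.

```python
def compare_rounds(before, after):
    """Return, for each item present in both rounds, the change in the
    class average (post-test minus pre-test), rounded to two decimals."""
    changes = {}
    for item, pre_scores in before.items():
        if item in after:
            pre_average = sum(pre_scores) / len(pre_scores)
            post_average = sum(after[item]) / len(after[item])
            changes[item] = round(post_average - pre_average, 2)
    return changes


# Hypothetical coded responses from four students on two items.
before = {
    "presents material clearly": [2, 3, 3, 4],
    "shows interest in students": [4, 4, 5, 5],
}
after = {
    "presents material clearly": [3, 4, 4, 5],
    "shows interest in students": [4, 4, 5, 5],
}
changes = compare_rounds(before, after)
for item, change in changes.items():
    print(item, change)
```

A positive change suggests movement in the intended direction on that item. With classes this small such differences are only suggestive, which is one reason the teachers also planned further collegial observation.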

Almost all of the teachers planned to use collegial observation and
conferences as a method of assessing their improvement. Many had already
set up times to begin another round of observations with their colleagues.
Others decided to change partners. The specific strategies for improvement
would suggest new criteria for the next round of observations. One of the
most gratifying results of the pilot test was that many of the participants
considered our collegial evaluation program so useful that they planned to
extend it throughout the school year. As one teacher said,

I need to have this kind of collegial evaluation on
a regular basis. If my colleague evaluated me
throughout the year, she would have an understanding
of the trends in my teaching and in a
particular class and the evaluation would be even
more helpful. She would be able to detect subtle
problem areas that I may not be aware of. I could
do the same for her and also continue to learn a
lot by observing another teacher at work.

Conclusions 

We began this discussion by criticizing traditional approaches to 
teacher evaluation and advocating collegial evaluation as an alternative. 
We summarized research revealing that teacher evaluation programs are all
weak in one or more steps of the evaluation process. According to teachers
and administrators we have interviewed, criteria for observation are usually
vague or unknown, observations are infrequent, useful feedback is rare, and
plans for teacher improvement are almost nonexistent. The experiences of
teachers in the pilot test of our collegial evaluation program gave us some
evidence for assessing this approach and comparing it with more traditional
methods of evaluating teachers.

Most important, we learned that teachers can and will help each other 
perform better on their jobs. We also learned that teachers will take 
students' assessments of their teaching seriously and use them in develop- 
ing plans for improvement. 

We found that the most difficult step of our program was selecting 
criteria to serve as a basis for evaluation. But most teachers did select 
some criteria that were specific, observable, and meaningful to them. We 
also learned that thinking about their criteria helped teachers assess not 
only where they might need to improve but what their goals as teachers were. 

We emphasized that the steps of the evaluation program are interdependent
and that a weakness in any one of them would diminish the program's
usefulness. This was especially apparent in reviewing improvement plans. 
If the criteria were specific, observable, and meaningful, if the observer 
was attentive and carefully reported observations to his or her colleague, 
and if the feedback exchanged was complete and honest, then the improve- 
ment plan generated by the pair of teachers was a thoughtful and practical 
blueprint for professional growth. The message is clear: teachers cannot
participate in this program in a half-hearted manner. If they are to use
it as a means for improving their teaching, they must commit themselves to
doing a thorough and careful job at every step.

Does collegial evaluation work? We believe the answer is yes. Based
on our pilot test we have concluded that collegial evaluation is a useful
approach to teacher evaluation in schools. On the whole, teachers reacted
favorably to collegial evaluation, adapted the program to fit their unique
circumstances, and gained new ideas for improving their teaching.

References 



Dornbusch, S. M., and Scott, W. R. Evaluation and the Exercise of
Authority. San Francisco: Jossey-Bass, 1975.

Marram, G. D., Dornbusch, S. M., and Scott, W. R. The Impact of Teaming
and the Visibility of Teaching on the Professionalism of Elementary
School Teachers. (Stanford Center for Research and Development in
Teaching, Technical Report No. 33) Stanford University, December 1972.

Thompson, J. E., Dornbusch, S. M., and Scott, W. R. Failures of
Communication in the Evaluation of Teachers by Principals. (Stanford
Center for Research and Development in Teaching, Technical Report
No. 43) Stanford University, April 1975.